Using machine learning pipeline to predict entry into the attack zone in football

Sports sciences are increasingly data-intensive, since computational tools can extract information from large amounts of data and derive insights from athlete performance during competition. This paper addresses a performance prediction problem in soccer, a popular team sport played by two teams competing against each other on the same field. In a soccer game, teams score points by placing the ball into the opponent’s goal, and the winner is the team with the highest number of goals. Retaining possession of the ball is one key to success, but it is not enough, since a team needs to score to achieve victory, which requires an offensive move toward the opponent’s goal. The focus of this work is to determine whether analyzing the first five seconds after one of the teams takes control of the ball provides enough information to determine whether the ball will reach the final quarter of the soccer field, therefore creating a goal-scoring chance. By doing so, we can further investigate which conditions increase strategic leverage. Our approach comprises modeling players’ interactions as graph structures and extracting metrics from these structures. These metrics, when combined, form time series that we encode in two-dimensional representations of visual rhythms, allowing feature extraction through deep convolutional networks, coupled with a classifier to predict the outcome (whether the final quarter of the field is reached). The results indicate that offensive play near the adversary penalty area can be predicted by looking at the first five seconds. Finally, the explainability of our models reveals the main metrics along with their contributions to the final inference result, which corroborates other studies found in the literature on soccer match analysis.
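As a rough illustration of the encoding step described above, the sketch below turns a short sequence of player positions into one two-dimensional channel. It is not the implementation used in the paper: the proximity rule for linking players, the `radius` parameter, the choice of node degree as the metric, and all function names are assumptions made for this example only.

```python
from math import hypot

def proximity_graph(positions, radius):
    """Adjacency sets: players i and j are linked when closer than `radius`.
    (The linking rule is an assumption for illustration.)"""
    adj = {i: set() for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            (xi, yi), (xj, yj) = positions[i], positions[j]
            if hypot(xi - xj, yi - yj) < radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def degree_vector(adj):
    """One graph metric per player: the number of linked players."""
    return [len(adj[i]) for i in sorted(adj)]

def visual_rhythm(frames, radius):
    """Stack per-frame metric vectors into a 2D array.
    Rows are players, columns are frames, values are scaled to 0..255."""
    columns = [degree_vector(proximity_graph(f, radius)) for f in frames]
    lo = min(min(c) for c in columns)
    hi = max(max(c) for c in columns)
    span = (hi - lo) or 1  # avoid division by zero for constant series
    n_players = len(columns[0])
    return [[int(255 * (col[p] - lo) / span) for col in columns]
            for p in range(n_players)]

# Two frames with three players each (toy coordinates).
frames = [
    [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)],
    [(0.0, 0.0), (1.0, 0.0), (1.5, 0.5)],
]
image = visual_rhythm(frames, radius=2.0)
```

Repeating this for several metrics and stacking the resulting arrays as channels would give an image-like input that a pre-trained convolutional network can consume.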

The main contribution is to show that the use of complex network features to understand and predict the attack behavior is viable. Another contribution relies on the combination of time series representations with convolutional networks, pre-trained with images, to produce effective predictions. To the best of our knowledge, no study in the literature has addressed this problem using the methodological steps investigated in this paper.

2. A deep literature review should be given, particularly advanced deep learning models in image processing. Therefore, the reviewer suggests discussing some related works by analyzing the following papers in the revised manuscript.

R: Current machine learning applications cover several areas of football data. Our literature review focuses on prediction, and some of these works are described in the Related Work section of the paper. Lines 55-133 provide an overview of studies that investigate prediction in the soccer context, and Lines 294-302 describe how deep learning has been used to predict the tracking of players.
3. Please clarify the contributions to this field, for example, which are the existing ones, and which are your own ones?

R: Our work shows that complex network features can be used to predict whether the team in possession of the ball will reach the attack zone, i.e., will have a chance of scoring.
4. How about the computational complexity of the proposed method? R: The complete deep learning architecture has 6.7 million trainable parameters, most of which (approximately 4 million) belong to EfficientNet, the main backbone used in our formulation. Recall that those parameters (weights) are only fine-tuned in our training strategy. Training the architecture takes one day on an NVIDIA GPU (GTX 1050 Ti with 4 GB of memory) with an AMD FX-8350-e CPU and 16 GB of RAM. This machine has a very modest configuration, which demonstrates the advantage of using pre-trained models and of keeping our framework simple. The training procedure is performed offline.
The other components of the devised pipeline (e.g., graph construction, extraction of complex network metrics, and construction of visual rhythms) take negligible time compared to the training of the network architecture.
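The figures above imply roughly 2.7 million trainable parameters outside the backbone (6.7 million in total minus approximately 4 million). For readers who wish to check such counts by hand, the sketch below gives the standard parameter formulas for dense and 2D convolutional layers. The layer sizes in the example are hypothetical and are not the layers of our architecture.

```python
def dense_params(n_in, n_out, bias=True):
    """Weights of a fully connected layer: one per input-output pair, plus biases."""
    return n_in * n_out + (n_out if bias else 0)

def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Weights of a 2D convolution: one kernel per (input, output) channel pair."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

# Hypothetical classifier head (sizes chosen only for illustration).
head = [dense_params(1280, 256), dense_params(256, 1)]
head_total = sum(head)

# Hypothetical 3x3 convolution from 3 to 32 channels, without bias.
stem = conv2d_params(3, 3, 3, 32, bias=False)
```

Counts of this kind describe model size only; training time also depends on input resolution, batch size, and the number of epochs.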
5. Some future directions should be pointed out in the conclusion. R: Thanks for pointing out this issue. In the current version of the manuscript, we extended the discussion of possible research directions for future work. Possible next steps include the implementation of graph-convolutional networks to improve the results and make the pipeline smaller. Another possible direction is to incorporate more spatial features of the attacking and defending teams, such as the covered area and the centroid of the team, as in the work of Frencken and Lemmink (FRENCKEN et al., 2011). We believe that the combination of complex network and spatial features may lead to improved prediction results.
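To make the two spatial features mentioned above concrete, the following minimal sketch computes a team centroid and a covered area from player coordinates. Measuring the covered area as the area of the convex hull of the players is an assumption made for this example, and the function names and toy coordinates are ours.

```python
def centroid(points):
    """Mean position of the players."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def covered_area(points):
    """Area of the convex hull, computed with the shoelace formula."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Toy example: four players on a 4 x 3 rectangle and one in the middle.
team = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.5)]
team_centroid = centroid(team)
team_area = covered_area(team)
```

Computed per frame, these values form additional time series that could be encoded alongside the complex network metrics.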

Reviewer 3
6. The Information highlighted, and the conceptual methodology, is already known and lots of advanced papers are already published. R: The paper aims to validate a new approach that uses well-established concepts, such as graphs to analyze player and team behavior, visual rhythms to represent time series, and machine learning to predict the flow of play. There may be works that use state-of-the-art algorithms, but our focus was to build and test a combination of graphs, time series representations, and machine learning in a single pipeline, with little data and heterogeneous teams. To the best of our knowledge, no study in the literature has investigated this combination for similar problems.
7. No specific novelty is there. R: Neither a new algorithm nor a new metric was the focus of innovation in this project; our aim was to innovate in how existing tools are applied to predict the movement of the teams. In this paper, we aimed to create an effective framework that could yield insights into how the attacking team may approach the penalty box, i.e., may have more chances of scoring. The main novelty of our work lies in the use of a complex network formulation. Other papers have analyzed teams using metrics similar to ours, but for different applications and objectives. The main contribution is to show that the use of complex network features to understand and predict the attack behavior is viable. Another contribution relies on the combination of time series representations with convolutional networks, pre-trained with images, to produce effective predictions. To the best of our knowledge, no study in the literature has addressed this problem using the methodological steps investigated in this paper.
8. No System model and Algorithm is highlighted in the proposed approach. R: A neural network was used to make the predictions; the focus was to show that it is possible to create a pipeline that unifies graphs, time series, images, and deep learning. Lines 294-302 of the manuscript describe how the convolutional network is used and its number of layers.
9. No Real Time case study based discussion. R: The intended application of this work is to take the historical data of a team and build a tool that helps understand how that team usually attacks, so that a strategy can be devised to stop it. A real-time application seems a promising use of this pipeline, and it is a possible continuation of the work. Note that real-time application of machine learning algorithms to positional data is uncommon, despite its potential. We believe that this is partly due to the enormous computational resources that might be needed.

Reviewer 1
10. Case study: given the nature of the paper, a set of use cases with the relative Shap local explanation are required to provide the reader some results from the application of the proposed methodology. A bit of analytics would also improve the paper: who are the players mostly contributing with/without the ball to successful ball possession? What are the network features that define such players? R: Thanks for raising this issue. SHAP-based analysis can provide values for each channel of the image (where each channel represents a metric). It is therefore possible to identify and point out the most important players. However, this idea was discarded during the development of the pipeline because training was performed with data from different teams. We plan to perform individual-based analyses in future work.
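As an illustration of the channel-level reading mentioned in this response, the sketch below aggregates attribution values over an image into one score per channel and ranks the channels. It does not call the SHAP library; the nested-list layout `shap_values[row][col][channel]`, the use of the mean absolute value as the score, the toy numbers, and the metric names are all assumptions made for this example.

```python
def channel_importance(shap_values):
    """Mean absolute attribution per channel.
    shap_values[row][col][channel] holds one attribution value per pixel and channel."""
    n_channels = len(shap_values[0][0])
    totals = [0.0] * n_channels
    n_pixels = 0
    for row in shap_values:
        for pixel in row:
            n_pixels += 1
            for c, value in enumerate(pixel):
                totals[c] += abs(value)
    return [t / n_pixels for t in totals]

# Hypothetical metric names, one per image channel.
metric_names = ["metric_a", "metric_b", "metric_c"]

# Toy attributions for a 2 x 2 image with three channels.
shap_values = [
    [[0.1, -0.4, 0.0], [0.2, 0.3, 0.0]],
    [[-0.1, 0.5, 0.1], [0.0, -0.4, 0.1]],
]

scores = channel_importance(shap_values)
ranking = sorted(zip(metric_names, scores), key=lambda kv: kv[1], reverse=True)
```

The same aggregation could be restricted to the rows associated with a single player, which is the kind of individual-based analysis we leave for future work.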
11. Experiment replicability: authors provide both data and code of their work, this is a huge strong point compared with many other papers on the same topic. However, data description is missing, both within the paper and in the data repository. A broader data description would improve this aspect. R: Originally, no description of the data was made available. The data are now accompanied by metadata, including, for example, descriptions of the labels and of how information is presented in each column: https://figshare.com/articles/dataset/REDSCAT2/19222746 This organization supports the replicability of our work and allows the validation of our results.